perf: chunk long string byte escaping#809
Draft
He-Pin wants to merge 1 commit intodatabricks:masterfrom
Draft
Conversation
Motivation: Split the JMH-positive long-string rendering piece out of databricks#776 without carrying over the broader Scala Native render-pipeline experiment. Modification: - Add CharSWAR.findFirstEscapeChar for byte arrays on JVM, JS, and Native. - Keep the existing UTF-8 byte array for long strings, but locate escape bytes and copy clean chunks with System.arraycopy. - Escape only the matching bytes inline. - Precompute the exact escaped output length before writing dirty strings so ByteBuilder does not grow repeatedly. Result: This keeps the change JDK17/JIT/GC friendly: straight byte-array loops, no internal JDK APIs, no extra temporary arrays beyond the existing UTF-8 encoding, and no regression on clean long strings.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation:
Split the JMH-positive, JDK17/JIT/GC-friendly long-string rendering piece out of #776. The original PR mixed renderer, stdlib, format, compareStrings, and Scala Native changes, and Native hyperfine was not clean enough to merge as one large PR.
Key Design Decision:
Keep this PR focused on byte-rendering long strings that contain JSON escapes. This PR does not include compareStrings, char materializer, stdlib asciiSafe/substr/join, or format char-array assembly changes from #776.
Modification:
CharSWAR.findFirstEscapeChar(byte[], from, to)on JVM, Scala.js, and Scala Native.BaseByteRenderer, keep the existing UTF-8 byte array for long strings, locate escape bytes, bulk-copy clean chunks withSystem.arraycopy, and escape only the matching bytes inline.ByteBuilderdoes not grow repeatedly.JDK17 / JIT / GC Notes:
System.arraycopy; no reflection or internal JDK APIs.HexFormat, is intentionally not used because per-control-char formatting would be a worse JIT/GC shape than the static hex table.Focused Target JMH + GC:
JMH ran with the project compiled at the JDK17 level using the current Mill toolchain. Command shape:
./mill --no-server bench.runJmh sjsonnet.bench.RegressionBenchmark.main -p path=... -wi 3 -i 5 -r 3s -w 2s -f 1 -prof gclarge_string_templatelarge_string_joinFull JMH + GC Sweep:
All 36 regression benchmark inputs were covered. The full sweep command used:
./mill --no-server bench.runJmh sjsonnet.bench.RegressionBenchmark.main -p path="$PATHS" -wi 3 -i 5 -r 2s -w 1s -f 1 -prof gc -rf jsonbench.07needs the same larger stack thatbench.runRegressionsnormally provides, so it was rerun separately with-jvmArgsAppend -Xss100m. The full sweep is a screening run, not a claim that this renderer-only PR improves unrelated stdlib/parser cases; several unrelated rows had obvious system/JIT outliers. There were no clear time regressions by JMH error interval overlap.assertionsbench.01bench.02bench.03bench.04bench.06bench.07bench.08bench.09gen_big_objectlarge_string_joinlarge_string_templaterealistic1realistic2base64base64Decodebase64DecodeBytesbase64_byte_arraybase64_stresscomparisoncomparison2escapeStringJsonfoldllstripCharsmanifestJsonExmanifestTomlExmanifestYamlDocmemberparseIntreverserstripCharsstripCharssubstrsetDiffsetIntersetUnionFocused Rechecks:
Rows that looked suspicious in the full sweep were rerun with longer settings. The only stable allocation concern from the raw table was
bench.09; a 3-fork rerun made it neutral/slightly lower.bench.06was also rerun because the raw full sweep showed a small allocation delta.bench.06bench.09Scala Native Hyperfine:
Native artifacts were built with
./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'on both master and this branch. Each hyperfine command loops 20 CLI invocations, uses--warmup 5 --runs 60, and the table divides the reported mean/median back to per-invocation milliseconds.large_string_templatelarge_string_joinCorrectness Review:
visitLongStringis only used forStringvalues whenescapeUnicode = false, matching the existing ByteRenderer path.",\, or< 0x20cannot falsely match inside a non-ASCII code point.RenderUtils.escapeBytefallback.elemBuilder.arrafterensureLength, so a grow cannot leave a stale array reference.Rejected Splits From #776:
Format.scalachar-array assembly: not JMH-positive on current master.length/substr/asciiSafe/joingroup:substrregressed, so it should not be split out as-is.std.joinexact-capacity builder: allocation improved in one run, but no-prof JMH regressed.String.indexOfescape scan: tiny signal only, not enough for a separate PR.Verification:
./mill --no-server 'sjsonnet.jvm[3.3.7].compile'./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'./mill --no-server 'sjsonnet.jvm[3.3.7].test'./mill --no-server 'sjsonnet.native[3.3.7].nativeLink'References:
3a9a492899420456070fb84eaa5b89f8b7dfe1bfed9af56139ad6a066910483104bae165cef53d16